    An analysis of students’ summaries using summary sentence decomposition

    Stemming Hausa text: using affix-stripping rules and reference look-up

    Stemming is the process of reducing an inflected or derived word to its root or stem by stripping its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. A few stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay and Amharic, but no algorithm has yet been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1,500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. We observed that reference look-up helped reduce both over-stemming and under-stemming errors, increased accuracy, and tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified.
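
    Combining affix stripping with a reference look-up can be sketched as below. This is an illustration under stated assumptions, not the paper's stemmer: the two suffix steps and the three-word root list are hypothetical stand-ins for the paper's 78 rules in 4 steps and its 1,500-root look-up. Known roots are returned untouched, which guards against over-stemming. After each stripping step the stem is also looked up with a final vowel restored, which guards against under-stemming and malformed stems.

```python
# Hypothetical, illustrative rules: NOT the paper's 78 rules or its 4-step ordering.
SUFFIX_STEPS = [
    ["oci", "aje"],  # step 1: plural-like suffixes (e.g. motoci, gidaje)
    ["n", "r"],      # step 2: genitive-linker-like suffixes (e.g. gidan, motar)
]
# Hypothetical reference look-up (the paper's list holds 1,500 roots).
ROOT_WORDS = {"mota", "gida", "abinci"}
VOWELS = "aeiou"


def stem(word, roots=ROOT_WORDS, steps=SUFFIX_STEPS):
    """Affix stripping with a reference look-up after every stripping step."""
    word = word.lower()
    if word in roots:
        return word  # a known root is never stripped (limits over-stemming)
    current = word
    for rules in steps:
        for suffix in rules:
            # Only strip if at least two characters would remain.
            if current.endswith(suffix) and len(current) - len(suffix) >= 2:
                current = current[: -len(suffix)]
                # Look up the bare stem, then the stem with a final vowel restored,
                # since stripping can also remove part of the root.
                for guess in [current] + [current + v for v in VOWELS]:
                    if guess in roots:
                        return guess
                break  # apply at most one rule per step
    return current  # no root matched: fall back to the pure affix-stripping result


for w in ["motoci", "gidaje", "gidan", "motar", "abinci"]:
    print(w, "->", stem(w))
```

    Without the look-up, "motoci" would end as the non-word "mot". The look-up maps it to the root "mota", which is the kind of accuracy gain the abstract reports.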

    Sentiment analysis of noisy Malay text: State of art, challenges and future work

    Sentiment analysis (SA) is the automatic extraction of people's opinions and emotions, in the form of sentiments, from natural language text. It is very useful in social media monitoring because it gives users an overall picture of public opinion on many topics. Most work on SA targets English text, and only a few works focus on the Malay language. Currently, the available review of SA for Malay focuses only on SA approaches and datasets. Major issues such as the pre-processing techniques used to normalize noisy text, the most commonly employed performance measures for Malay SA, and the challenges of Malay SA have not been reviewed. Malaysians tend not to strictly follow any abbreviation rules when writing on social media, so sites such as Facebook and Twitter contain a great deal of noisy text, which complicates the SA process. Hence, this study investigates the state of the art, challenges and future work of SA for Malay social media text. It reviews the approaches, datasets, performance measures, and pre-processing techniques used in previous work on SA of Malay text. More than 700 articles from journals and conference proceedings were identified using the search keywords; however, only 17 relevant articles published from 2013 to 2018 were reviewed. The findings of this review focus on the three most commonly used SA approaches: lexicon-based, machine learning, and hybrid.
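
    The review names lexicon-based methods as one of the three main approaches and stresses that noisy Malay text must be normalized first. The sketch below pairs a dictionary-based normalization pass with a lexicon scorer that handles negation and intensifiers. It is only a sketch: the normalization map, lexicon, negators and intensifier weights are tiny hypothetical examples, not resources from the reviewed works.

```python
import re

# Hypothetical toy resources (not from the reviewed studies).
NORMALIZE = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "bgs": "bagus"}
LEXICON = {"bagus": 1, "suka": 1, "teruk": -1, "benci": -1}  # +1 positive, -1 negative
NEGATORS = {"tidak", "bukan"}
INTENSIFIERS = {"sangat": 2}


def normalize(text):
    """Lower-case, tokenize, and map informal spellings to standard forms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(t, t) for t in tokens]


def score(text):
    """Sum lexicon scores, adjusting each by the two tokens that precede it."""
    tokens = normalize(text)
    total = 0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        window = tokens[max(0, i - 2):i]
        for w in window:
            value *= INTENSIFIERS.get(w, 1)  # "sangat bagus" counts double
        if any(w in NEGATORS for w in window):
            value = -value  # "tidak bagus" flips polarity
        total += value
    return total


def polarity(text):
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(polarity("Makanan ni sgt bgs"), polarity("Servis x bagus"))
```

    Without the normalization step, "sgt bgs" and "x" would be out-of-lexicon tokens and both example sentences would score as neutral. This is why the review treats pre-processing as a core issue.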

    Identifying Students' Summary Writing Strategies Using Summary Sentence Decomposition Algorithm

    Summary writing is one of the important skills taught in schools. A summary is a condensed version of an existing text, and producing one differs from other types of writing in that it requires specific strategies. Most research on summary assessment has focused on the end product of summary writing rather than its process. Research has shown that a lack of strategic skills is one cause of students' difficulties in writing good summaries. A few systems are available to help teachers assess students' summaries based on content and style, but virtually none have been developed to assess the process, particularly to identify the strategies used. To address this need, we propose an algorithm based on summary sentence decomposition to identify students' summary writing strategies. We first analyzed summaries written by experts, extracted the strategies used in them, formulated a set of heuristic rules to define the strategies, and finally transformed the rules, using a position-based method, into the summary sentence decomposition algorithm (SSDA). For evaluation, we measured the algorithm's ability to identify the different strategies and compared its performance against human experts. The results, based on 168 summary sentences, indicate that the algorithm successfully identified the following syntax-level strategies: deletion, sentence combination, copy-paste, syntactic transformation and sentence reordering. The algorithm's performance closely matched that of the human experts, with 94% accuracy in identifying the syntax-level strategies. In future work, the algorithm will be extended to identify semantic-level strategies, diagnose the strategies used and provide constructive feedback.
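
    The abstract does not list SSDA's heuristic rules. The following is therefore only a sketch of how a position-based decomposition might label syntax-level strategies. It tokenizes each summary sentence and finds the source sentences that share at least `min_overlap` distinct words (a hypothetical threshold). It then uses word order to separate copy-paste, deletion, syntactic transformation and sentence combination. `reordered` flags sentence reordering across a whole summary. Semantic-level strategies such as paraphrasing fall through to "unclassified".

```python
import re


def tokens(sentence):
    return re.findall(r"\w+", sentence.lower())


def is_subsequence(small, big):
    """True if every token of `small` appears in `big` in the same order."""
    it = iter(big)
    return all(tok in it for tok in small)  # `in` consumes the iterator


def contributors(summary_toks, source_toks, min_overlap=3):
    """Indices of source sentences sharing at least `min_overlap` distinct tokens."""
    return [i for i, t in enumerate(source_toks)
            if len(set(summary_toks) & set(t)) >= min_overlap]


def classify(summary_sentence, source_sentences, min_overlap=3):
    """Label one summary sentence with a (hypothetical) syntax-level strategy."""
    s = tokens(summary_sentence)
    src = [tokens(x) for x in source_sentences]
    if s in src:
        return "copy-paste"
    hits = contributors(s, src, min_overlap)
    if len(hits) >= 2:
        return "sentence combination"
    if len(hits) == 1:
        t = src[hits[0]]
        if is_subsequence(s, t):
            return "deletion"  # words kept in source order, some dropped
        if set(s) <= set(t):
            return "syntactic transformation"  # same words, new order
    return "unclassified"


def reordered(summary_sentences, source_sentences, min_overlap=3):
    """True if the summary draws on source sentences out of their original order."""
    src = [tokens(x) for x in source_sentences]
    firsts = []
    for sent in summary_sentences:
        hits = contributors(tokens(sent), src, min_overlap)
        if hits:
            firsts.append(hits[0])
    return firsts != sorted(firsts)


source = ["The cat sat on the warm mat all day.", "The dog barked at the mailman loudly."]
print(classify("The cat sat on the mat.", source))
```

    Working on token positions rather than raw strings keeps the checks simple. Deletion is an in-order subsequence of one source sentence, and transformation uses the same words in a different order.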

    Toward Tweets Normalization Using Maximum Entropy

    The use of social network services and microblogs such as Twitter has created valuable text resources that contain extremely noisy text. Twitter messages contain so much noise that they are difficult to use in natural language processing tasks. This paper presents a new approach that uses the maximum entropy model to normalize Tweets. The proposed approach addresses words that are unseen in the training phase: although the maximum entropy model needs a training dataset to adjust its parameters, the approach can normalize data not seen in the training set. The principle of maximum entropy emphasizes incorporating the available features into a uniform model. First, we generate a set of normalized candidates for each out-of-vocabulary word based on lexical, phonemic, and morphophonemic similarities. Then, three different probability scores are calculated for each candidate using positional indexing, a dependency-based frequency feature and a language model. After the optimal values of the model parameters are obtained in a training phase, the model can calculate the final probability value for each candidate. The approach achieved a BLEU score of 83.12 when tested on 2,000 Tweets. Our experimental results show that the maximum entropy approach significantly outperforms previous well-known normalization approaches.
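
    A maximum entropy model scores each candidate with a log-linear distribution, P(c) = exp(Σ_k λ_k f_k(c)) / Z, where Z sums over all candidates. The sketch below shows that scoring step under several assumptions. Candidates come from lexical similarity only, using Python's `difflib.SequenceMatcher`; the paper's phonemic and morphophonemic similarities are omitted. `toy_features` is a hypothetical stand-in for the paper's three scores. The weights are fixed by hand here, whereas the paper fits them in a training phase.

```python
import math
from difflib import SequenceMatcher


def generate_candidates(oov, vocabulary, threshold=0.6):
    """Candidate generation by lexical similarity only."""
    return [w for w in vocabulary if SequenceMatcher(None, oov, w).ratio() >= threshold]


def maxent_probabilities(features, weights):
    """P(c) = exp(sum_k weights[k] * features[c][k]) / Z over all candidates c."""
    logits = {c: sum(w * f for w, f in zip(weights, fs)) for c, fs in features.items()}
    top = max(logits.values())  # shift by the max logit for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def normalize_token(oov, vocabulary, feature_fn, weights):
    """Most probable normalization of `oov`, or `oov` itself if no candidate exists."""
    candidates = generate_candidates(oov, vocabulary)
    if not candidates:
        return oov
    probs = maxent_probabilities({c: feature_fn(oov, c) for c in candidates}, weights)
    return max(probs, key=probs.get)


# Hypothetical feature stand-ins: similarity, toy log-frequency, constant placeholder.
TOY_COUNTS = {"tomorrow": 50, "tomato": 5, "today": 80}


def toy_features(oov, cand):
    return (SequenceMatcher(None, oov, cand).ratio(), math.log(TOY_COUNTS[cand]), 0.0)


print(normalize_token("tomorow", TOY_COUNTS, toy_features, weights=(5.0, 0.5, 1.0)))
# prints: tomorrow
```

    Because the features are combined inside a single normalized distribution, any number of scores can be added as extra feature columns without changing the scoring code.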

    Feel-It: An Intelligent Secondary School Physics Q&A System

    Feel-It is an intelligent question-and-answer (Q&A) system for secondary school Physics. The system is designed to help solve open-ended Physics problems, provide adaptive guidance, and return relevant learning resources from the Internet according to users' queries. The proposed architecture for Feel-It consists of four basic modules: data extraction, question classification, solution identification and answer formulation. The data extraction module builds the Physics knowledge base. The question classification module identifies and analyzes the question. The solution identification module solves Physics questions and selects the top n most relevant resource references. The last module, answer formulation, arranges and compiles the results into the system output. Our preliminary results show that the system produces correct answers with up to 60% accuracy.
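
    The four-module architecture can be pictured as a pipeline in which each module's output feeds the next. The sketch below is a toy version, not Feel-It itself. The hand-built `KNOWLEDGE_BASE` stands in for the data extraction module. Keyword matching stands in for question classification. The solution step only extracts unit-tagged numbers and applies one formula. Resource retrieval (the top-n references) is omitted.

```python
import re
from typing import Callable, Dict, List, Tuple

# 1. Data extraction (stand-in): a tiny hand-built knowledge base.
# topic -> (keywords, units of the given values in order, formula, result unit)
KNOWLEDGE_BASE: Dict[str, Tuple[List[str], List[str], Callable[..., float], str]] = {
    "force": (["force"], ["kg", "m/s^2"], lambda m, a: m * a, "N"),
    "speed": (["speed", "velocity"], ["m", "s"], lambda d, t: d / t, "m/s"),
}


def classify_question(question):
    """2. Question classification: pick the first topic whose keywords appear."""
    q = question.lower()
    for topic, entry in KNOWLEDGE_BASE.items():
        if any(k in q for k in entry[0]):
            return topic
    return None


def identify_solution(topic, question):
    """3. Solution identification: read values tagged with the expected units, apply the formula."""
    _, units, formula, result_unit = KNOWLEDGE_BASE[topic]
    values = []
    for unit in units:
        # A number, optional spaces, the unit, and no further unit characters after it.
        m = re.search(r"(\d+(?:\.\d+)?)\s*" + re.escape(unit) + r"(?![\w/^])", question)
        if m is None:
            return None
        values.append(float(m.group(1)))
    return formula(*values), result_unit


def formulate_answer(question):
    """4. Answer formulation: assemble the module outputs into a reply."""
    topic = classify_question(question)
    if topic is None:
        return "Sorry, this question type is not supported."
    solved = identify_solution(topic, question)
    if solved is None:
        return f"Recognised a {topic} question, but the given values are incomplete."
    value, unit = solved
    return f"{topic.capitalize()} = {value:g} {unit}"


print(formulate_answer("What is the force on a 2 kg mass accelerating at 3 m/s^2?"))
```

    Keeping the modules as separate functions mirrors the architecture described above: a richer classifier or knowledge base can replace one stage without touching the others.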

    Utilizing decision tree machine model to map dental students’ preferred learning styles with suitable instructional strategies

    Background: Growing demand for student-centered learning (SCL) has been observed in higher education settings, including dentistry. However, the application of SCL in dental education is limited. Hence, this study aimed to facilitate SCL in dentistry by using a decision tree machine learning (ML) technique to map dental students' preferred learning styles (LS) to suitable instructional strategies (IS), as a promising approach to developing an IS recommender tool for dental students. Methods: A total of 255 dental students at Universiti Malaya completed the modified Index of Learning Styles (m-ILS) questionnaire, containing 44 items, which classified them into their respective LS. The collected data (the dataset) was used for decision tree supervised learning to automate the mapping of students' learning styles to the most suitable IS. The accuracy of the ML-empowered IS recommender tool was then evaluated. Results: Applying a decision tree model to automate the mapping between LS (input) and IS (target output) instantly generated a list of suitable instructional strategies for each dental student. The IS recommender tool demonstrated perfect precision and recall for overall model accuracy, suggesting good sensitivity and specificity in mapping LS to IS. Conclusion: The decision tree ML-empowered IS recommender tool proved accurate at matching dental students' learning styles with the relevant instructional strategies. This tool provides a workable path to planning student-centered lessons or modules that could enhance students' learning experience.
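
    The abstract does not say which decision tree implementation was used. As a hedged illustration of the technique, here is a minimal CART-style tree written from scratch. It uses Gini impurity and multiway splits on categorical features. The four records are hypothetical: they use two learning-style dimensions (active/reflective processing, visual/verbal input) as inputs and made-up instructional-strategy labels as targets. They do not reproduce the study's 255-student dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Feature whose split gives the lowest weighted Gini impurity (None if no gain)."""
    best, best_score = None, gini(labels)
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best, best_score = f, score
    return best


def build_tree(rows, labels, features):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf
    f = best_split(rows, labels, features)
    if f is None:
        return majority
    branches = {}
    for value in {row[f] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [g for g in features if g != f])
    return (f, branches, majority)  # internal node; majority is the fallback


def predict(tree, row):
    while isinstance(tree, tuple):
        f, branches, default = tree
        if row.get(f) not in branches:
            return default  # unseen value: fall back to the node's majority label
        tree = branches[row[f]]
    return tree


# Hypothetical toy records: learning-style dimensions -> instructional strategy.
rows = [
    {"processing": "active", "input": "visual"},
    {"processing": "active", "input": "verbal"},
    {"processing": "reflective", "input": "visual"},
    {"processing": "reflective", "input": "verbal"},
]
labels = ["group discussion", "group discussion", "video lecture", "reading assignment"]
tree = build_tree(rows, labels, ["processing", "input"])
print(predict(tree, {"processing": "reflective", "input": "verbal"}))
```

    A tree suits this mapping because each path from root to leaf is a readable rule, such as "reflective and verbal -> reading assignment". Teachers can inspect such rules when planning lessons.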

    Improved sine cosine algorithm with simulated annealing and singer chaotic map for Hadith classification

    Feature selection (FS) is an important task in classification, and Hadith classification is one problem to which FS can be applied. Hadiths are the second major source of Islam after the Quran. Thousands of Hadiths are available, and they are grouped into a number of classes; many studies in the literature address Hadith classification. The Sine Cosine Algorithm (SCA) is a new metaheuristic optimization algorithm that explores the search space using sine and cosine mathematical formulas to find the optimal solution. However, like other optimization algorithms (OAs), SCA suffers from entrapment in local optima and from limited solution diversity. In this paper, to overcome these problems and apply SCA to the FS problem, two major improvements were introduced to the standard SCA algorithm. The first improvement uses the Singer chaotic map within SCA to improve solution diversity. The second uses the Simulated Annealing (SA) algorithm as a local search operator within SCA to improve its exploitation. In addition, the Gini Index (GI) is used to filter the selected features, reducing the number of features that SCA must explore. Furthermore, three new Hadith datasets were created and used in our experiments to evaluate the proposed Improved SCA (ISCA). To confirm the generality of ISCA, we also applied it to 14 benchmark datasets from the UCI repository. The ISCA results were compared with the original SCA and with state-of-the-art algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA), and the most recent optimization algorithm, Harris Hawks Optimizer (HHO). The results confirm that ISCA clearly outperforms the other optimization algorithms and the Hadith classification baselines. From these results, we infer that ISCA can improve classification accuracy while simultaneously selecting the most informative features.
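
    In the standard SCA, each component of a solution X moves relative to the best solution P found so far. The update is X ← X + r1·sin(r2)·|r3·P − X| when r4 < 0.5, and uses cos(r2) otherwise. The amplitude r1 = a − t·a/T shrinks linearly over T iterations. The sketch below combines that update with a Singer chaotic map and an SA local search on P. It rests on several assumptions. The abstract does not say where the chaotic values enter, so here they replace the uniform r3. The SA schedule (steps, temperature, cooling, step size) is hypothetical. The optimizer runs on a continuous test function rather than on binary Hadith feature selection, and the Gini Index filter is omitted.

```python
import math
import random


def singer_map(x, mu=1.07):
    """One step of the Singer chaotic map on (0, 1)."""
    x = mu * (7.86 * x - 23.31 * x**2 + 28.75 * x**3 - 13.302875 * x**4)
    return min(max(x, 1e-6), 1 - 1e-6)  # guard against drifting out of (0, 1)


def simulated_annealing(f, start, lo, hi, rng, steps=20, temp=1.0, cooling=0.9, step_size=0.01):
    """Local search around `start`; worse moves accepted with probability exp(-delta / T)."""
    current, f_cur = list(start), f(start)
    best, f_best = list(current), f_cur
    for _ in range(steps):
        cand = [min(max(v + rng.gauss(0, step_size * (hi - lo)), lo), hi) for v in current]
        f_cand = f(cand)
        delta = f_cand - f_cur
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(current), f_cur
        temp *= cooling
    return best, f_best


def isca(f, dim, lo, hi, agents=20, iters=100, a=2.0, seed=0):
    """Minimize f over [lo, hi]^dim with chaotic SCA plus SA refinement of the best."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(agents)]
    best = min(X, key=f)
    best, f_best = list(best), f(best)
    chaos = 0.7
    for t in range(iters):
        r1 = a - t * a / iters  # shrinking amplitude: exploration -> exploitation
        for x in X:
            for j in range(dim):
                chaos = singer_map(chaos)
                r2 = 2 * math.pi * rng.random()
                r3 = 2 * chaos  # assumption: chaotic value replaces uniform r3 in [0, 2]
                step = abs(r3 * best[j] - x[j])
                if rng.random() < 0.5:  # r4
                    x[j] += r1 * math.sin(r2) * step
                else:
                    x[j] += r1 * math.cos(r2) * step
                x[j] = min(max(x[j], lo), hi)
            fx = f(x)
            if fx < f_best:
                best, f_best = list(x), fx
        # SA refines the current best solution to strengthen exploitation.
        cand, f_cand = simulated_annealing(f, best, lo, hi, rng)
        if f_cand < f_best:
            best, f_best = cand, f_cand
    return best, f_best


sphere = lambda v: sum(c * c for c in v)
best, value = isca(sphere, dim=5, lo=-5.0, hi=5.0)
print(round(value, 6))
```

    For feature selection, each position would typically be thresholded into a binary mask, and the fitness would combine classifier error with the number of selected features. That wrapper is beyond this sketch.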

    QAPD: an ontology-based question answering system in the physics domain

    The tremendous development of information technology has led to an explosion of data and motivated the need for powerful yet efficient strategies for knowledge discovery. Question answering (QA) systems make it possible to ask questions and retrieve answers using natural language queries. In an ontology-based QA system, the knowledge base in which answers are sought has a structured organization, and retrieving answers from an ontology knowledge base is a convenient way to obtain usable knowledge. This paper presents QAPD, an ontology-based QA system for the physics domain that integrates natural language processing, ontologies and information retrieval technologies to provide users with relevant information. The system allows users to retrieve information from formal ontologies using queries formulated in natural language. We propose an inferring schema mapping method, which uses a combination of semantic and syntactic information and attribute-based inference to transform users' questions into ontological knowledge base queries. In addition, a novel domain ontology for physics, called EAEONT, is presented. Relevant standards and regulations were used extensively during the ontology building process. The original characteristic of the system is the strategy used to bridge the gap between users' expressiveness and formal knowledge representation. The system has been developed and tested in English using an ontology that models the physics domain. The performance level achieved enables the use of the system in real environments.
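
    Turning a question into an ontology query can be sketched as mapping question words to an entity and a property, producing a triple pattern (entity, property, ?x). The pattern is answered from the entity or, failing that, from its superclasses. This is a toy stand-in under stated assumptions, not QAPD's inferring schema mapping method or the EAEONT ontology. The triples and the `PROPERTY_WORDS` map are hypothetical. Simple `subClassOf` inheritance substitutes for the paper's attribute-based inference. A production system would more likely generate SPARQL against an RDF/OWL store.

```python
import re

# Hypothetical toy ontology as (subject, predicate, object) triples (not EAEONT).
TRIPLES = [
    ("Force", "hasUnit", "newton"),
    ("Force", "hasFormula", "F = m * a"),
    ("Energy", "hasUnit", "joule"),
    ("KineticEnergy", "subClassOf", "Energy"),
    ("KineticEnergy", "hasFormula", "E = 0.5 * m * v^2"),
]
# Hypothetical lexical mapping from question words to ontology properties.
PROPERTY_WORDS = {"unit": "hasUnit", "measured": "hasUnit",
                  "formula": "hasFormula", "calculate": "hasFormula"}


def readable_label(name):
    """'KineticEnergy' -> 'kinetic energy'."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", name).lower()


def to_query(question):
    """Map a question to a triple pattern (entity, property, '?x'), or None."""
    q = question.lower()
    # Longest labels first, so 'kinetic energy' wins over 'energy'.
    entities = sorted({s for s, _, _ in TRIPLES}, key=lambda e: -len(readable_label(e)))
    entity = next((e for e in entities if readable_label(e) in q), None)
    prop = next((p for w, p in PROPERTY_WORDS.items() if re.search(r"\b" + w + r"\b", q)), None)
    if entity is None or prop is None:
        return None
    return (entity, prop, "?x")


def superclasses(entity):
    """The entity followed by its subClassOf ancestors (toy data has no cycles)."""
    chain = [entity]
    while True:
        parents = [o for s, p, o in TRIPLES if s == chain[-1] and p == "subClassOf"]
        if not parents:
            return chain
        chain.append(parents[0])


def answer(question):
    query = to_query(question)
    if query is None:
        return None
    entity, prop, _ = query
    for cls in superclasses(entity):  # inherit the property if the entity lacks it
        hits = [o for s, p, o in TRIPLES if s == cls and p == prop]
        if hits:
            return hits[0]
    return None


print(answer("In what unit is kinetic energy measured?"))
```

    The second step is what lets a question about kinetic energy be answered with "joule" even though only its superclass carries a unit. This is a small-scale version of closing the gap between how users phrase questions and how the ontology stores knowledge.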